total db lockup - Mailing list pgsql-general

From
Subject total db lockup
Date
Msg-id 20050818132430.SUFW1586.tomts42-srv.bellnexxia.net@[209.226.175.82]
Whole thread Raw
List pgsql-general
(NOTE: reposting this for the *fifth* time because my previous messages didn't go through).


Hi all,

We have experienced a really weird problem with
postgresql yesterday. When I was called in to take a
look, all the non-superuser connections were used up
and they were all in a waiting state (SELECT waiting,
UPDATE waiting, etc.). I couldn't figure out what the
problem is, but I saved the ps, pg_stat_activity, and
pg_locks state at the time (attached as
condor_db_stats.txt). BTW, when looking at the queries
in pg_stat_activity, postgresql cuts them off so you
can't see the entire string. Is there any way to
prevent that, or at least increase the character
limit?

We restarted postgresql server and the problem
reappeared a few minutes later. I saved that state
also (condor_db_stats2.txt). At that point, after we
restarted postgresql (again), I ran VACUUM on the
entire database and did a few more things that seem to
have solved the problem (see below).

This particular database is essentially just one flat
table (level) with a few small supporting tables. Only
the level table is heavily used. (table definition is
attached as table.txt). I noticed that one of the
indexes (level_owner_index) was a hash index. I
remembered what postgresql manual says about hash
indexes and concurrency
(http://www.postgresql.org/docs/7.4/interactive/locking-indexes.html)
and, after VACUUM finished, replaced the hash index
with a btree. I then did a REINDEX of the level table
and ANALYZE. This seems to have solved the problem --
at least as of this morning we still have not seen any
deadlocks.

My question is, what could have caused this to happen?
Can anyone explain this paragraph from the manual:

"Share/exclusive page-level locks are used for
read/write access. Locks are released after the page
is processed. Page-level locks provide better
concurrency than index-level ones but are liable to
deadlocks."


Any other pointers to help me figure out what went
wrong and how to fix it?

thanks,

Eugene

WTF? My message doesn't appear. Trying again without
attachments or inline text.


pgsql-general by date:

Previous
From: Michael Fuhr
Date:
Subject: Re: Weird performance hit
Next
From: "Michel Bouchard"
Date:
Subject: